
Microdata Support in EM translator #1660

Open: wants to merge 6 commits into base: master


zuphilip (Contributor)

This supersedes #1068; support for microdata was requested in #366. Now that schema.org support is ready in the RDF translator, I have tried to integrate the microdata functionality into the EM translator and adjusted it a little. It already works on the few examples I used, but more testing is needed and any comments are welcome.

zuphilip changed the title from "[WIP] Microdata" to "Microdata Support in EM translator" on May 20, 2018
zuphilip requested a review from adam3smith on May 20, 2018, 15:40
adam3smith (Collaborator)

Great! I'll review this bit by bit. One thing I'm concerned about in general terms is whether we expect microdata for multiple items on a page. My understanding of the format is that when I, e.g., cite a book in an article, I could use it to embed citation information for that book.
I guess three questions:

  1. Am I understanding that correctly?
  2. If so, is that actually done to any meaningful degree?
  3. If both 1. and 2., how are we thinking of handling that?

zuphilip (Contributor, Author)

Yes, that is possible, but I haven't seen citations encoded in microdata so far.

My test cases while developing this translator were the translators that already use itemprop somewhere, i.e. https://github.com/zotero/translators/search?utf8=%E2%9C%93&q=itemprop&type=. All of them should work fine with the current code, and the microdata should improve some of the extracted data. The one exception is https://www.kitapyurdu.com/kitap/makroekonomi/139156.html, where several publication items are indeed described on the same page (the main entry and related items). I treated this as an exception and was happy that it works fine for all the other cases.
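To make the "several publication items on one page" case concrete, here is a minimal sketch in Python (not the translator's JavaScript) of how microdata resolves into top-level items versus nested ones. It uses only the standard-library `html.parser`; the value rules are a simplified subset of the WHATWG microdata algorithm (`itemref` is ignored), and `PUBLICATION_TYPES`, `MicrodataParser` and `publication_items` are hypothetical names, not part of the EM translator.

```python
from html.parser import HTMLParser

# Void elements have no end tag, so they are never pushed on the element stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}

# Hypothetical selection of schema.org types treated as "publications".
PUBLICATION_TYPES = {"Article", "Book", "CreativeWork", "ScholarlyArticle", "Thesis"}


def attr_value(tag, attrs):
    """Simplified microdata value rules; None means "use the text content"."""
    if tag == "meta":
        return attrs.get("content") or ""
    if tag in ("a", "area", "link"):
        return attrs.get("href") or ""
    if tag in ("audio", "embed", "iframe", "img", "source", "video"):
        return attrs.get("src") or ""
    if tag == "time" and attrs.get("datetime"):
        return attrs["datetime"]
    return None


def add_prop(item, names, value):
    for name in names:
        item["properties"].setdefault(name, []).append(value)


class MicrodataParser(HTMLParser):
    """Builds microdata items as nested dicts (sketch; itemref is ignored)."""

    def __init__(self):
        super().__init__()
        self.items = []       # top-level items (no enclosing item)
        self.scopes = []      # items whose element is still open
        self.stack = []       # open elements: (tag, opened_item, text_collector)
        self.collectors = []  # properties currently gathering text content

    def handle_starttag(self, tag, attr_list):
        attrs = dict(attr_list)
        names = (attrs.get("itemprop") or "").split()
        parent = self.scopes[-1] if self.scopes else None
        opened, collector = False, None
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "id": attrs.get("itemid"),
                    "properties": {}}
            if names and parent is not None:
                add_prop(parent, names, item)   # nested item, e.g. an author
            else:
                self.items.append(item)         # treated as top-level here
            self.scopes.append(item)
            opened = True
        elif names and parent is not None:
            value = attr_value(tag, attrs)
            if value is not None:
                add_prop(parent, names, value)
            else:
                collector = {"item": parent, "names": names, "text": []}
                self.collectors.append(collector)
        entry = (tag, opened, collector)
        if tag in VOID:
            self._close(entry)                  # no content, no end tag
        else:
            self.stack.append(entry)

    def handle_data(self, data):
        for collector in self.collectors:       # text counts for every open property
            collector["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        # Close the most recent matching element and anything left open inside it.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                for entry in reversed(self.stack[i:]):
                    self._close(entry)
                del self.stack[i:]
                return

    def close(self):
        super().close()
        for entry in reversed(self.stack):      # flush elements never closed
            self._close(entry)
        self.stack.clear()

    def _close(self, entry):
        _tag, opened, collector = entry
        if collector is not None:
            self.collectors.pop()               # always the innermost collector
            add_prop(collector["item"], collector["names"],
                     " ".join("".join(collector["text"]).split()))
        if opened:
            self.scopes.pop()


def publication_items(html):
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return [item for item in parser.items
            if any(t.rsplit("/", 1)[-1] in PUBLICATION_TYPES
                   for t in (item["type"] or "").split())]
```

Fed a page with two sibling `Book` blocks, `publication_items` returns two entries, while an author `Person` attached via `itemprop="author"` stays a property of its book. A cited work nested under an `itemprop` would likewise stay nested rather than becoming a separate top-level item.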

EM's detection is unchanged, so the microdata does not influence detection. The cases where results could get worse than before are therefore those where we already detect a single item but the microdata on the page now describes multiple items.

What may be really tricky is creating the itemid. The main item gets the URL as its itemid, and other items (e.g. a person) get a different itemid. We could try to create different itemids for multiple publications, which would automatically lead to different items being output in doWeb. But can we decide this in a generic way?
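One possible convention for the itemid question, sketched in Python under loudly stated assumptions: `assign_item_ids` and the `#microdata-item-N` fragment scheme are hypothetical and not what the EM translator does. It keeps an explicit `itemid` when the page provides one, gives the page URL to the first remaining item (assumed to be the main entry), and gives every further publication a synthetic fragment so that all subjects stay distinct.

```python
from urllib.parse import urldefrag


def assign_item_ids(page_url, items):
    """Return one distinct subject id per top-level publication item.

    Hypothetical scheme, not the EM translator's behaviour:
      * an explicit, not yet used itemid is kept;
      * the first remaining item is taken as the main entry and gets the page URL;
      * every further item gets the page URL plus a synthetic fragment.
    Distinct ids keep the statements for each publication separate,
    so each one could become its own item.
    """
    base, _fragment = urldefrag(page_url)
    used = set()
    ids = []
    for index, item in enumerate(items):
        explicit = item.get("id")
        if explicit and explicit not in used:
            subject = explicit
        elif page_url not in used:
            subject = page_url
        else:
            # Synthetic fragment; bump the counter until it is unused.
            n = index
            subject = f"{base}#microdata-item-{n}"
            while subject in used:
                n += 1
                subject = f"{base}#microdata-item-{n}"
        used.add(subject)
        ids.append(subject)
    return ids
```

Using document order to pick the main entry is the weakest assumption here: on a page where a related-items block comes before the main entry, it would choose the wrong one, which is exactly the genericity problem raised above.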

(There are two debug statements you can uncomment to see more information about the statements and the types.)

zuphilip (Contributor, Author)

This needs further testing. I am now seeing some strange behavior, e.g. in the Springer translator, which depends on EM.

mrtcode (Member) commented Jan 4, 2019

@zuphilip Interesting, what was the problem with Springer? I'm trying to understand what issues we could encounter by adding support for more metadata types to the EM translator.

zuphilip (Contributor, Author)

@mrtcode Sorry, I can't remember. But you could check out the branch and test detection/extraction on the Springer website again.
